Continuous Similarity Computation over Streaming Graphs

نویسندگان

  • Elena Valari
  • Apostolos N. Papadopoulos
چکیده

Large network analysis is a very important topic in data mining. A significant body of work in the area studies the problem of node similarity. One way to express node similarity is to associate with each node the set of 1-hop neighbors and compute the Jaccard similarity between these sets. This information can be used subsequently for more complex operations like link prediction, clustering or dense subgraph discovery. In this work, we study algorithms to monitor the result of a similarity join between nodes continuously, assuming a sliding window accommodating graph edges. Since the arrival of a new edge or the expiration of an existing one may change the similarity between several node pairs, the challenge is to maintain the similarity join result as efficiently as possible. Our theoretical study is validated by a thorough experimental evaluation, based on real-world as well as synthetically generated graphs, demonstrating the superiority of the proposed technique in comparison to baseline approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Approximation-based Streaming Skylines for Similarity Search Query

Actually, large database is not simply considered as a stream database because of streaming data is not only containing huge data volumes, but distributed, continuous, rapid, time varying. Therefore, the general techniques may not suit for streams exactly. Accuracy responses required of approximated answers is more important in stream processing for the similarity search. Therefore, we perform ...

متن کامل

Intractability of min- and max-cut in streaming graphs

We show that the exact computation of a minimum or a maximum cut of a given graph G is out of reach for any one-pass streaming algorithm, that is, for any algorithm that runs over the input stream of G’s edges only once and has a working memory of o(n) bits. This holds even if randomization is allowed.

متن کامل

Time Constrained Continuous Subgraph Search over Streaming Graphs

The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. Efficient analysis over high-speed data from dynamic applications is of great significance. Data from these dynamic applications can be easily modeled as streaming graph. In this paper, we study the subgraph (isomorphism) search over streaming graph data t...

متن کامل

Continuous Spatiotemporal Trajectory Joins

Given the plethora of GPS and location-based services, queries over trajectories have recently received much attention. In this paper we examine trajectory joins over streaming spatiotemporal data. Given a stream of spatiotemporal trajectories created by monitored moving objects, the outcome of a Continuous Spatiotemporal Trajectory Join (CSTJ) query is the set of objects in the stream, which h...

متن کامل

READS: A Random Walk Approach for Efficient and Accurate Dynamic SimRank

Similarity among entities in graphs plays a key role in data analysis and mining. SimRank is a widely used and popular measurement to evaluate the similarity among the vertices. In real-life applications, graphs do not only grow in size, requiring fast and precise SimRank computation for large graphs, but also change and evolve continuously over time, demanding an efficient maintenance process ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013